May 27, 2015
Concern from the stats community
Former Googler Rachel Schutt taught a similar Data Science course at Columbia University.
Only intro stats and some exposure to R
18 students, mostly juniors and seniors.
| Major | Count |
|---|---|
| Mathematics | 4 |
| Biological Science: Biology & Biochem and Molecular Biology | 4 |
| Other Science: Chemistry, Environmental Studies, Physics | 4 |
| Social Science: Political Science, Sociology | 2 |
| Economics | 2 |
| Misc: Psychology, Linguistics | 2 |
ASA's GAISE Reports
dplyr package for data wrangling/manipulation
ggplot2 package for data visualization
Data manipulation via the following verbs on tidy data:
filter: keep observations matching criteria
summarise: reduce many values to one
mutate: create new variables from existing ones
arrange: reorder rows
select: pick columns by name
join: join two data sets
group_by: group subsets of observations together
A statistical graphic consists of a mapping of data variables to aesthetic attributes of geometric objects that we can observe.
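To make the dplyr verbs listed above concrete, here is a minimal sketch that chains all seven in one pipeline. The data frames `toy_flights` and `toy_carriers`, their column names, and every value in them are made up for illustration; they are not the Houston flights data.

```r
library(dplyr)

# Hypothetical toy data: names and values are invented for this sketch.
toy_flights <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "WN"),
  dest     = c("DFW", "ORD", "ORD", "SFO", "DAL"),
  distance = c(224, 925, 925, 1635, 217),  # miles
  air_time = c(40, 130, 125, 225, 45),     # minutes
  stringsAsFactors = FALSE
)
toy_carriers <- data.frame(
  carrier = c("AA", "UA", "WN"),
  name    = c("American", "United", "Southwest"),
  stringsAsFactors = FALSE
)

result <- toy_flights %>%
  filter(distance > 220) %>%                      # keep observations matching criteria
  mutate(speed = distance / (air_time / 60)) %>%  # new variable (mph) from existing ones
  group_by(carrier) %>%                           # group subsets of observations together
  summarise(n_flights = n(),
            mean_speed = mean(speed)) %>%         # reduce many values to one per group
  arrange(desc(mean_speed)) %>%                   # reorder rows, fastest carrier first
  left_join(toy_carriers, by = "carrier") %>%     # join two data sets on a shared key
  select(name, n_flights, mean_speed)             # pick columns by name

print(result)
```

Each verb takes a data frame and returns a data frame, which is what lets them be chained with `%>%`. `left_join` is only one of several join functions dplyr provides; it is used here as a representative of the "join" verb.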
ggplot2 allows us to construct graphics in a modular fashion by specifying these components.
| Data (Variable) | Aesthetic | Geometric Object |
|---|---|---|
| longitude | x position | points |
| latitude | y position | points |
| army size | size = width | bars |
| army direction | color = brown or black | bars |
| date | (x,y) position | text |
| temperature | (x,y) position | lines |
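The first rows of the table above can be sketched in ggplot2 as follows. This is an illustrative sketch only: the `troops` data frame and all of its values are hypothetical, not Minard's figures, and army size is mapped to point size rather than bar width to keep the example short.

```r
library(ggplot2)

# Hypothetical toy data: positions and sizes are invented for this sketch.
troops <- data.frame(
  longitude = c(24, 28, 32, 36, 36, 32, 28, 24),
  latitude  = c(54.9, 54.5, 55.0, 55.5, 55.3, 54.8, 54.3, 54.4),
  army_size = c(400, 300, 200, 100, 100, 60, 30, 10),
  direction = c(rep("advance", 4), rep("retreat", 4)),
  stringsAsFactors = FALSE
)

# Data + aesthetic mappings + geometric objects, added one layer at a time.
p <- ggplot(troops, aes(x = longitude, y = latitude, colour = direction)) +
  geom_path(aes(group = direction)) +    # route: positions joined in row order
  geom_point(aes(size = army_size)) +    # army size mapped to point size
  scale_colour_manual(values = c(advance = "brown", retreat = "black")) +
  labs(x = "Longitude", y = "Latitude",
       size = "Army size", colour = "Direction")

print(p)
```

Swapping a geom or remapping an aesthetic changes one line and leaves the rest of the specification alone, which is the modularity the grammar is meant to provide.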
Domestic flights leaving Houston airport (IAH) in 2011. Four data sets:
flights: info on all 227,496 flights
weather: hourly weather info
planes: information on all 2,853 airplanes
airports: information on all 3,376 destination airports
Best predictors have distinct differences (in gender) in large segments of the population.
All 222,540 songs played on the Reed pool hall jukebox from 2003 to 2009.
| date_time | artist | album | track |
|---|---|---|---|
| Sun Dec 7 05:12:57 2003 | Tom Petty and the Heartbreakers | Into the Great Wide Open | |
| Sun Dec 7 05:15:56 2003 | Jefferson Airplane | Somebody To Love | |
| Sun Dec 7 05:23:04 2003 | Led Zeppelin | Led Zeppelin IV | 08 When The Levee Breaks |
quandl.com has a great R interface
This is the only stats class many will take.
Presentation on 2011/06/27 given by Dierdre and Amir: